Machine listening

Machine listening is a technique using software and hardware to extract meaningful information from audio signals. The engineer Paris Smaragdis, interviewed in Technology Review, talks about these systems --"software that uses sound to locate people moving through rooms, monitor machinery for impending breakdowns, or activate traffic cameras to record accidents." [1]

Since audio signals are interpreted by the human ear-brain system, that complex perceptual mechanism should be simulated somehow in software for "machine listening". In other words, to perform on par with humans, the computer should hear and understand audio content much as humans do. Analyzing audio accurately involves several fields: electrical engineering (spectrum analysis, filtering, and audio transforms); psychoacoustics (sound perception); cognitive sciences (neuroscience and artificial intelligence); acoustics (physics of sound production); and music (harmony, rhythm, and timbre). Furthermore, audio transformations such as pitch shifting, time stretching, and sound object filtering, should be perceptually and musically meaningful. For best results, these transformations require perceptual understanding of spectral models, high-level feature extraction, and sound analysis/synthesis. Finally, structuring and coding the content of an audio file (sound and metadata) stand to benefit from efficient compression schemes, which discard inaudible information in the sound. [2]

References

1) Paris Smaragdis taught computers how to play more life-like music

2) Machine Listening Course Webpage at MIT